Massive disambiguation of large text corpora with flexible categorial grammar
Authors
Abstract
A new [...] of automatic linguistic disambiguation of big texts is described, using recent mathematical results from the theory of categorial grammar.
Similar resources
Towards High Speed Grammar Induction on Large Text Corpora
In this paper we describe an efficient and scalable implementation for grammar induction based on the EMILE approach ([2], [3], [4], [5], [6]). The current EMILE 4.1 implementation ([11]) is one of the first efficient grammar induction algorithms that work on free text. Although EMILE 4.1 is far from perfect, it enables researchers to do empirical grammar induction research on various types of corpora...
Translating Treebank Annotation For Evaluation
In this paper we discuss the need for corpora with a variety of annotations to provide suitable resources to evaluate different Natural Language Processing systems and to compare them. A supervised machine learning technique is presented for translating corpora between syntactic formalisms and is applied to the task of translating the Penn Treebank annotation into a Categorial Grammar annotatio...
Vowel Sound Disambiguation for Intelligible Korean Speech Synthesis
For speech synthesis systems that transform text materials into voice data, correctness and naturalness are the crucial measures of performance, the latter gaining more emphasis recently. In order to make synthesized voices natural, we must take into account pronunciation-related linguistic phenomena such as homographs, among others. The syntax certainly provides an important clue to disambiguat...
Acquisition of Large Scale Categorial Grammar Lexicons
A system is presented for inducing Categorial Grammar (CG) lexicons for natural language from either unannotated or minimally annotated corpora extracted from the Penn Treebank. A combination of symbolic and stochastic methods has been used to build a computationally effective and psychologically plausible system, which learns linguistically useful lexicons. There are a variety of parameters in...
CCG Syntactic Reordering Models for Phrase-based Machine Translation
Statistical phrase-based machine translation requires no linguistic information beyond word-aligned parallel corpora (Zens et al., 2002; Koehn et al., 2003). Unfortunately, this linguistic agnosticism often produces ungrammatical translations. Syntax, or sentence structure, could provide guidance to phrase-based systems, but the “non-constituent” word strings that phrase-based decoders manipulat...